Feedback arcs and node hierarchy in directed networks
Zhao Jin-Hua1, 3, Zhou Hai-Jun1, 2, †
Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy

 

† Corresponding author. E-mail: zhouhj@itp.ac.cn

Abstract

Directed networks such as gene regulation networks and neural networks are connected by arcs (directed links). The nodes in a directed network are often strongly interwound by a huge number of directed cycles, which leads to complex information-processing dynamics in the network and makes it highly challenging to infer the intrinsic direction of information flow. In this theoretical paper, based on the principle of minimum-feedback, we explore the node hierarchy of directed networks and distinguish feedforward and feedback arcs. Nearly optimal node hierarchy solutions, which minimize the number of feedback arcs from lower-level nodes to higher-level nodes, are constructed by belief-propagation and simulated-annealing methods. For real-world networks, we quantify the extent of feedback scarcity by comparison with the ensemble of direction-randomized networks and identify the most important feedback arcs. Our methods are also useful for visualizing directed networks.

1. Introduction

Directed networks are formed by nodes and arcs (i.e., directed links) pointing from one node to another. They are ubiquitous in biological and technological systems; for instance, neurons in the brain rely on directed synaptic connections to form an information-processing network,[1] and cell regulatory networks contain directed interactions between genes, proteins, and other small molecules.[24] Structural properties of directed networks at different scales have been studied in the literature for many years, especially on network small motifs,[4,5] mesoscopic communities,[6,7] strongly connected components,[8,9] and network hierarchical structures.[1014] A directed network can easily be decomposed into a set of strongly connected components (SCCs). If one represents each SCC by a single vertex and neglects its internal connections, then at this coarse-grained level the directed network is simply a directed acyclic graph and there is no directed cycles among the SCCs.[9,13] Each SCC is itself a maximal subnetwork formed by some nodes and the arcs between them, and any node can reach and be reached by any other node of the same SCC through at least one directed path. Directed cycles are usually abundant in the large SCCs (each of which contains many nodes and arcs), and they cause strong feedback effect and make the information-processing dynamics in the network highly complex.[15,16]

The hierarchical structures within large SCCs of directed networks have not yet been fully investigated except for a few earlier efforts (e.g., Refs. [12], [14], and [16]–[18]). Due to the cyclic nature of a SCC, it appears at first sight to be quite ambiguous or even meaningless to order its nodes in a particular way and to define an intrinsic flow direction.[13] However in this paper we show that the arcs that are most vital for feedback interactions can be identified by collectively considering all the directed cycles of the network. We take an optimization approach based on the so-called principle of minimum feedback,[12] which defines the minimum feedback arc set problem. An integer hierarchical level is assigned to each node of the input network and the resulting level configuration of all the nodes is called a node hierarchy. The node levels in this hierarchy are optimized by two efficient physics-inspired algorithms, SA and BPD, which minimize the total number of feedback arcs (defined as those pointing from lower-level nodes to higher-level nodes).

Given a real-world directed network, we can construct many near-minimum feedback arc sets by repeatedly running the SA or BPD algorithm. The sizes of these constructed sets are very close to each other and are much smaller than the total number of arcs in the network. We also find that, while most of the arcs of the network never appear in any of these feedback arc sets, a few of them appear in almost all of them. As a concrete example, for the Florida food web[19] formed by 128 nodes and 2106 arcs, only six of the arcs need to be classified as feedbacks, (Fig. 1) which is much lower than the expected number of 601 feedback arcs in a direction-randomized network. Our algorithms reveal that two arcs of the food-web network are present in all the minimum feedback arc sets. Similar results are obtained for other real-world networks.

Fig. 1. (color online) An optimal node hierarchy for the Florida food web[19] and the corresponding feedback arcs. The whole network has 128 nodes and 2106 arcs, and its largest strongly connected component contains 103 nodes and 1579 arcs (for clarity only this component is shown). The nodes (black dots) are arranged to 22 hierarchical levels starting from level 0 at the bottom. Gray solid lines are feedforward arcs (pointing from higher-level nodes to lower-level nodes), and red dashed lines are feedback arcs (pointing from lower-level nodes to higher-level nodes). Each blue dotted line represents a pair of opposite arcs between two nodes.

By distinguishing feedforward arcs and feedback arcs for a real-world directed network, our work help to reveal the hidden principal direction of flows in the network and the hierarchical organization of the nodes within the strongly connected network components. For biological networks, the identified most important feedback arcs might serve as optimal targets of intervening the system.[20] Our algorithms are also useful for network visualization.[21] The source codes of these algorithms will be publicly available to facilitate analyzing and visualizing biological, technological, and social networks.

The minimum feedback principle serves as a useful framework for understanding the structural properties of directed networks. Since feedback interactions interfere with the default dynamics of a real-world complex system, it should be desirable to delete the redundant feedbacks to increase efficency and decrease cost. We therefore expect the minimum feedback principle to be evolutionarily plausible for real-world complex networks. Even if the feedback minimization is not a major driving force for network evolution and adaptation, the technical results of this work still hold.

2. Node hierarchy and feedback arcs

Given a directed network G of N nodes and M arcs, with arc density , we introduce a node hierarchy to partially order the N nodes . The level of each node i takes a nonnegative integer value , and level 0 is the lowest. A node j at positive level hj must have at least one outgoing arc to a node k at one level below (i.e., ) to justify its level. Under these level constraints, our goal is to construct an optimal node hierarchy which agrees with most of the arcs, so that the total number of arcs from higher-level nodes to lower-level nodes reaches the global maximum value.

The node hierarchy problem is essentially equivalent to the feedback arc set problem, a fundamental and famous non-deterministic polynomial hard (NP-hard) problem in computer science[22] (Appendix A). A set Λ of arcs is regarded as a feedback arc set (FAS) if it intersects with every directed cycle of the network. Notice that if all the arcs of a FAS are deleted, the remaining network contains no directed cycle. A FAS is a minimal one if any of its proper subset is no longer a FAS; and it is a minimum one if its cardinality is the smallest among all the feedback arc sets. Given a node hierarchy of network G, the set formed by all the arcs with is a FAS. On the other hand, a unique node hierarchy can be constructed for any FAS by first deleting all the arcs of this set from the network and assigning the lowest level 0 to all the nodes which have no outgoing arc, followed by iteratively assigning the level to all the remaining nodes which have outgoing arcs only to nodes at lower levels. Indeed there is a one-to-one correspondence between node hierarchies and the so-called neat feedback arc sets (Appendix A). All the minimal and minimum feedback arc sets (and some special non-minimal ones) are neat, and therefore an optimal node hierarchy is equivalent to a minimum FAS.

3. Mean field theory and belief propagation

We can treat the node hierarchy problem as a statistical mechanical system. Let us define the energy of an arc as for and for . The total energy of hierarchy is then the sum of arc energies

We write down the following equilibrium partition function Z to combine the effects of energy and level constraints:

Here ψi is the Boltzmann factor of node i due to its level constraint: if or i has an outgoing arc to a node j at level ; otherwise . The Boltzmann factor of arc is if its energy is zero (); otherwise with the inverse temperature β being an adjustable parameter. Notice that each node hierarchy contributes a weight to Z. At sufficiently large values of β, the node hierarchies with the global minimum energy value (i.e., the optimal node hierarchies) will have overwhelming contributions to the partition function Z.

We have solved model (2) by the replica-symmetric (RS) cavity method developed in the spin glass research field[2328] (Appendix B). Due to the strong level constraints, the mean-field equations of this RS theory are very complicated and are computationally inefficient.

Fortunately, the equivalence between the optimal node hierarchy problem and the minimum FAS problem means that we can obtain a near-optimal node hierarchy by first constructing a near-minimum FAS. For the latter task, the level constraints of Eq. (2) are not necessary, so we can drop them by setting the Boltzmann factors of all the nodes i to be . The RS mean-field theory for this relaxed model is much more convenient for numerical treatment (Appendix C). This simplified theory estimates the probability of arc being a feedback arc to be

where the integer D restricts the level of each node i to be to compensate for the removed level constraints; the function denotes the probability that node j will be at level hj if node is absent. The self-consistent belief propagation (BP) equation for this cavity probability is
where and ; and is the subset of with being excluded, similarly for .

We can iterate the BP equation (4) on the network G at a fixed large value of D (e.g., D = 200) and different values of β. The mean fraction ρ of feedback arcs at inverse temperature β is estimated by

Based on Eqs. (3) and (4), a belief-propagation–guided decimation (BPD) algorithm is also implemented to construct near-minimum feedback sets. Briefly speaking, at each decimation step a tiny fraction of arcs with the largest estimated values are deleted from the network G; then G is further simplified by deleting all the nodes which have no incoming or outgoing arc; then equation (4) is iterated a small number of times and the value of for each remaining arc is updated. Details of the BPD algorithm can be found in Appendix D.

4. Simulated annealing

Let us represent an N-node permutation as a column vector with and if . Another way of simplifying the level constraints of Eq. (2) is to set the level hi of each node i to be its vertical position in . A most convenient way of permutating the nodes to reduce the total arc energy is simulated annealing (SA).[29] This method has been successfully applied on the directed and undirected feedback vertex set problems.[28,30,31] For the present FAS problem we follow the simple recipe of Ref. [30] (Appendix E). Starting from an initial random permutation and an initial low inverse temperature β, at each time step two rejection-free updating processes are performed: 1) an upward arc with is chosen among all such arcs with probability proportional to and node i is moved to be immediately above node j in permutation , where is the increase in the number of upward arcs caused by this move; and 2) an upward arc is chosen among all such arcs with probability proportional to and is moved to be immediately below node in , where again is the increase in the number of upward arcs caused by this move. After such time steps (e.g., or even larger) the inverse temperature is increased to (e.g., ). The search process terminates at a sufficiently large value of β.

5. Results on random network instances

We first test the BPD and SA algorithms on directed Erdös–Rényi (ER), directed regular random (RR) and directed scale-free (SF) random networks.[32,33] Both ER and RR networks are homogenous, while SF networks are quite heterogeneous in that some nodes have a lot of attached arcs (Appendix F). As the arc directions are completely random, no intrinsic flow direction should exist in these artificial networks. Our motivation here is to check whether near-minimum feedback arc sets can be achieved by BPD and SA.

We find that BPD and SA perform almost equally good on all the heterogeneous (SF) random networks and on the homogeneous (ER and RR) networks of arc density ; the fractions ρ of feedback arcs in the constructed FAS solutions are very close to the predicted values by the RS mean-field theory, indicating that nearly optimal solutions are indeed achieved (Fig. 2). BPD and SA greatly outperform the local degree-based heuristic (DH) which recursively deletes the arc with the highest value of from the network to destroy all directed cycles,[34] with the in-degree and out-degree being, respectively, the number of incoming and outgoing arcs of node i.

Fig. 2. (color online) Numerical results for random directed networks. α, arc density; ρ, fraction of feedback arcs. Algorithmic results of the local DH (circles), BPD (pluses), and SA (crosses) are compared with the predictions of the RS mean-field theory (triangles). Level upper-bound D = 200 and inverse temperature for the BPD algorithm and the RS theory. Each data point is the average over 40 network instances of size ; standard deviation (not shown) is less than . Four ensembles of random networks are considered: (a) Erdös–Rényi (ER); (b) regular random (RR); (c) scale-free static (SFS[33]) with in- and out-degree exponents and ; and (d) scale-free configurational (SFC[32]) with in-degree exponent and different out-degree exponents and minimum in- and out-degree and in- and out-degree upper-bound .

The SA algorithm slightly outperforms BPD for directed ER and RR networks of arc density . For ER networks of arc density α = 5.0, the typical fraction ρ of feedback arcs in solutions constructed by SA has the value , while the corresponding value for the BPD-obtained solutions is . We can improve the performance of BPD to a small extent by choosing a larger value D of level upper-bound, but the computation cost increases linearly with D. To further boost the performance of the BPD algorithm and beat the SA algorithm, we need to design a better statistical physics model for the feedback arc set problem. We plan to explore this challenging issue in a future paper.

In terms of computation time, the BPD and SA algorithms both are efficient on relatively small random directed network instances. They usually reach satisfactory solutions within one or two hours when applied on networks of size vertices and arc densities . For large network instances, however, the BPD algorithm is much faster than the SA algorithm. For example, given a RR network of size and arc density α = 5, the SA algorithm needs a computing time of 53.2 hours to construct a FAS solution while the BPD algorithm only needs a much shorter time of 21.5 hours (the test was performed on a desktop computer with Intel-6300 1.86 GHz CPU and 2 GB memory).

6. Results on real-world network instances

As a demonstration of practical applications, we now apply BPD and SA on a small set of representative real-world directed networks (Table 1).

Table 1.

Solving the node hierarchy problem for real-world networks. N, node number; M, arc number; , number of simple arcs (which have no opposite counterpart); , number of simple feedback arcs; and , expected number of simple feedback arcs in a direction-randomized network and its standard deviation; R, scarcity extent of feedback arcs. Simulation results are all obtained by the SA algorithm.

.

Regulatory: the epidermal growth factor receptor (EGFR) signal transduction network,[3,15] with N = 61 nodes and M = 112 arcs. Each node represents a molecular species such as kinases, phosphatase, and ions; each arc represents a directed regulatory interaction between two molecular species.

Food web: the Florida Bay ecosystem network,[19] containing N = 128 nodes and M = 2106 arcs. Each node represents a species (such as bacteria, zooplankton, shrimp) or a molecular type such as particular organic carbon, and each directed arc represents transfer of biomass between two kinds of species or molecules.

Neural: the neural network of the nematode C. elegans,[5] containing N = 297 nodes and M = 2359 arcs. Each node represents a neural cell and each arc represents a directed connection between two neurons.

Circuit: the electronic sequential logic circuit network EC-s838,[5] containing N = 512 nodes and M = 819 directed connections.

Metabolic: the metabolic network of the nematode C. elegans,[10] with N = 1469 nodes and M = 3447 arcs. Each node represents a chemical molecule or an enzyme, and each arc means that a given molecule participates in a particular enzyme-catalyzed reaction or is produced by this reaction.

Wiki-Vote: the network of who-votes-on-whom among the Wikipedia administrators,[35] containing N = 7115 nodes and M = 103689 arcs.

P2P-share: the Gnutella peer-to-peer file sharing network,[36] containing N = 62586 nodes and M = 147892 arcs. Each node represents a computer server and each arc represents directed file transfer between two servers.

For these real-world network instances, we again find that the feedback arc sets contructed by BPD and SA are of very similar sizes. It is very likely that near-optimal FAS solutions have been achieved by these two algorithms. The SA algorithm and BPD perform equally good on the four small network instances, but SA slightly outperforms BPD on the three large network instances (Metabolic, Wiki-Vote, P2P-share). We list in Table 1 the results obtained by a single running of the SA algorithm on the examined real networks, where and respectively denote the total number of simple arcs and simple feedback arcs (excluding all the bi-directional arcs).

For each examined real-world network we also generate 96 replicas with the same connectivity pattern but completely randomized directions of all the simple arcs, and apply SA on them to obtain the expected number of simple feedback arcs and its standard deviation . The scarcity R of feedback arcs in the original network is then quantified as

This quantity has a clear statistical meaning. A large positive value of R suggests that the number of feedback arcs in the original network is significantly lower than the expected number of feedback arcs in a direction-randomized network. Similarly, a highly negative R value suggests that feedback arcs are significantly more abundant in the original network than in a direction-randomized network.

As Table 1 reveals, feedback arcs are very rare in the Florida food web,[19] the C. elegans neural network,[5] and social networks Wiki-Vote[35] and P2P-share,[36] which all have very large positive R values. Reducing the number of feedback connections might enhance the efficiency of information processing in neural and social networks. On the other hand, feedback arcs are strongly enriched in the C. elegans metabolic network,[10] which has a highly negative R value. It may be necessary to have an abundant number of feedback connections to finely regulate the concentrations of cellular molecules.

When we repeatedly run the SA or BPD algorithm on the same real-world network instance, we find that the output feedback arc sets are usually not identical although their sizes are almost equal. Most importantly, we find that most of the arcs in the network never appear in any of these constructed feedback arc sets, but some arcs are present in almost all these sets (Fig. 3). These results strongly indicate that the arcs in a real-world network have very different significance in terms of the feedback role, and our SA and BPD algorithms can identify a small set of most important feedback arcs.

Fig. 3. (color online) Rank plot on the frequency (probability) of each arc being a feedback arc. The arcs are ranked in decreasing order according to its frequency. The results are obtained by running the SA algorithm independently for 200 times on the same input network instance. (a) Regulatory network (FAS cardinality: 7) and Food web network (FAS cardinality: 6). (b) Neural network (FAS cardinality, mean and standard deviation: ). (c) Circuit network (FAS cardinality: 32±1). (d) Metabolic network (FAS cardinality: 556±2). (e) Wiki-Vote network (FAS cardinality: 3038±2). (f) P2P-share network (FAS cardinality: 2266±6).

Figure 3 indicates that the near-optimal FAS solutions obtained by SA or BPD are considerably degenerate. We expect the true optimal FAS solutions will also be degerate. An important reason for this degeneracy is the presence of many long directed strings (non-branched paths) formed by vertices whose in- and out-degree both are unity. If one part of a directed cycle is such a non-branched path, there are many degenerate ways of breaking this cycle. Notice that the directed non-branched paths do not affect the overall node hierarchy of the network nor the principal flow direction of the network. We can reduce this degeneracy by replacing each directed non-branched path from one node i to another node j by a single coarse-grained arc pointing from i to j.

After the feedback probability for every arc of a real-world network has been computed (through repeatedly running SA or BPD or, more efficiently, through employing Eq. (3) and BP iteration), the feedforward part (the backbone[12]) of the network can easily be constructed by checking every arc of the network in increasing order of the feedback probability and adding it to the backbone if no directed cycle will be formed.

7. Conclusion and outlook

In this paper, we introduced the optimal node hierarchy problem, which is essentially equivalent to the minimum feedback arc set problem, and presented two physics-based algorithms to efficiently solve this problem for random and real-world directed networks. Our BPD and SA algorithms are capable of revealing the hidden hierarchical structure and the principal flow direction of a real-world directed network. Our methods can also be used to discover a small number of arcs which are involved most significantly in feedback interactions. We found that feedback interactions are extremely supressed in some real-world networks.

The methods of this work may have wide practical applications in studies of biological, technological, and social networks and in network engineering. For example, after the intrinsic flow direction in the network has been determined, it may become much more easier to design efficient arc-deletion or arc-addition strategies to improve the functionality of the network and to make it more robust against random failures or intentional attacks. The key feedback arcs identified by our algorithms may serve as optimal targets of intervening the dynamical processes on the network.

A natural extension of the present work is to consider optimal ways of cutting long directed cycles to dismantle a directed network. Similar to the proposal of optimally dismantling an undirected network,[37,38] we may iteratively delete the arcs that are predicted to be most important for long-range feedback interactions to break the original directed network down into many small strongly connected components. Detailed numerical study on this important network optimization problem will be reported in a separate paper.

Directed cycles are large-scale structural aspects of a directed network. They cause complicated global constraints to the node hierarchy and FAS problems. Further efforts are needed to improve the theoretical models and the BPD algorithm of this paper. Indeed the two spin glass models of the present paper still have major shortcomings. Firstly, each node i of the network can take many different level states hi, which considerably slows down the numerical computation. Secondly, the predicted minimum cardinalities of feedback arc sets by the two models differ noticably with each other and with the algorithmic results of BPD and SA. Thirdly, the associated BPD algorithms of the two models perform worse than the SA algorithm on homogeneous random networks of relative large arc densities. We hope these issues will be overcome in the near future by a refined statistical physics model of the minimum feedback arc set problem.

Reference
[1] White J G Southgate E Thomson J N Brenner S 1986 Phil. Trans. R. Soc. Lond. 314 1
[2] Li F Long T Lu Y Ouyang Q Tang C 2004 Proc. Natl. Acad Sci. USA 101 4781
[3] Oda K Matsuoka Y Funahashi A Kitano H 2005 Mol. Syst. Biol. 1 2005.0010
[4] Alon U 2007 Nature Rev. Genetics 8 450
[5] Milo R Shen-Orr S Itzkovitz S Kashtan N Chklovskii D Alon U 2002 Science 298 824
[6] Leicht E A Newman M E J 2008 Phys. Rev. Lett. 100 118703
[7] Fortunato S 2010 Phys. Rep. 486 75
[8] Tarjan R E 1972 SIAM J. Comput. 1 146
[9] Dorogovtsev S N Mendes J F F Samukhin A N 2001 Phys. Rev. 64 025101(R)
[10] Jeong H Tombor B Albert R Oltvai Z N Barabási A L 2000 Nature 407 651
[11] Ravasz E Somera A L Mongru D A Oltvai Z N Barabási A L 2002 Science 297 1551
[12] Lan Y Mezić I 2011 BMC Syst. Biol. 5 37
[13] Corominas-Murtra B Goñi J Solé R V Rodríguez-Caso C 2013 Proc. Natl. Acad Sci. USA 110 13316
[14] Domínguez-García V Pigolotti S Muñoz M A 2014 Sci. Rep. 4 7497
[15] Fiedler B Mochizuki A Kurosawa G Saito D 2013 J. Dynam. Differ. Equat. 25 563
[16] Xu J Lan Y 2015 PLoS ONE 10 e0125886
[17] Ispolatov I Maslov S 2008 BMC Bioinformatics 9 424
[18] Gupte M Shankar P Li J Muthukrishnan S Iftode L 2011 Proceedings of the twentieth International World Wide Web Conference March 28–April 01, 2011 Hyderabad, India 557
[19] Ulanowicz R E Bondavalli C Egnotovich M S 1998 Network analysis of trophic dynamics in south Florida ecosystem, fy 97: The Florida Bay ecosystem Technical report Chesapeake Biological Laboratory Solomons 1998
[20] Liu Y Y Barab’si A L 2016 Rev. Mod. Phys. 88 035006
[21] Eades P Sugiyama K 1990 J. Information Processing 13 424
[22] Garey M Johnson D S 1979 Computers and Intractability: A Guide to the Theory of NP-Completeness San Francisco Freeman
[23] Mézard M Montanari A 2009 Information, Physics, and Computation New York Oxford Univ. Press
[24] Mézard M Parisi G 2001 Eur. Phys. J. 20 217
[25] Bayati M Borgs C Braunstein A Chayes J Ramezanpour R Zecchina R 2008 Phys. Rev. Lett. 101
[26] Altarelli F Braunstein A Dall’Asta L Zecchina R 2013 J. Stat. Mech.: Theor. Exp. 2013 P09011
[27] Guggiola A Semerjian G 2015 J. Stat. Phys. 158 300
[28] Zhou H J 2016 J. Stat. Mech.: Theor. Exp. 2016 073303
[29] Kirkpatrick S Gelatt C D Jr. Vecchi M P 1983 Science 220 671
[30] Galinier P Lemamou E Bouzidi M W 2013 J. Heuristics 19 797
[31] Qin S M Zhou H J 2014 Eur. Phys. J. 87 273
[32] Dorogovtsev S N Mendes J F F 2002 Adv. Phys. 51 1079
[33] Goh K I Kahng B Kim D 2001 Phys. Rev. Lett. 87 278701
[34] Pardalos P M Qian T B Resende M G C 1998 J. Combin. Optim. 2 399
[35] Leskovec J Huttenlocher D Kleinberg J 2010 Proceedings of the 19th International Conference on World Wide Web April 26-30, 2010 Raleigh, NC, USA 641
[36] Ripeanu M Foster I Iamnitchi A 2002 IEEE Internet Comput. 6 50
[37] Mugisha S Zhou H J 2016 Phys. Rev. 94 012305
[38] Braunstein A Dall’Asta L Semerjian G Zdeborová L 2016 Proc. Natl. Acad. Sci. USA 113 12368